Search Results: "Ian Wienand"

5 February 2011

Ian Wienand: Comcast self-setup with Netgear routers

I just got a Zoom 5341 modem to replace a very sketchy old Motorola model and so far it seems to work fine. However, the Comcast self-install switch process was not seamless.

After you plug your new modem in, the process starts fine, capturing your session and prompting you for your account. However, at the next step it prompts you to download the Windows- or Mac-only client, directing you to an address http://cdn/.... It is at this point you can get stuck if you're behind a Netgear router (I have a WNDR3700) and probably others. The simple problem is that, for whatever reason, the factory firmware in this router does not pass on the domain search suffix through its inbuilt DHCP client. Without that, cdn doesn't resolve to anything, so you can't download the self-help tool and you're effectively stuck at a "can not find this page" screen. If you google at this point you can find many people who have hit this problem, and various information ranging from erroneous suggestions that you've been hacked (?) to most people just giving up and moving into Comcast phone support hell.

Assuming you now don't have internet access, you can complete the process with a quick swizzle. Somebody might be able to correct me on this, but I don't think you want to run the self-install tool from a computer plugged directly into the modem if you have a router in the picture, because the wrong MAC address will get registered. So, your best solution is to turn everything off, plug your computer directly into the modem, turn it on, get an address, download the tool, turn everything off, plug the router back in to the modem and then run the self-install tool. At this point, everything just worked fine for me.

Netgear should probably fix this by correctly passing through the domain search suffixes in their routers. Comcast should probably fix this by doing some sort of geo-IP lookup to give clients a fully-qualified address in that webpage to download the tool even if their router is broken (or really, does downloading that file require a content-delivery network?). You can probably fix this by running OpenWrt on your router before you start. Otherwise, the Zoom modem and the Netgear WNDR3700 seem to make a very nice combination which I would recommend.
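For reference, what the router should have passed through is just a search suffix in your DHCP lease. A minimal sketch of supplying it by hand on a Linux box (the suffix and nameserver here are hypothetical; use whatever Comcast's DHCP server actually hands out):

# /etc/resolv.conf
# "search" lists suffixes tried for bare names, so "cdn" gets looked
# up as "cdn.hsd1.ca.comcast.net" (hypothetical suffix)
search hsd1.ca.comcast.net
nameserver 75.75.75.75   # or whatever nameserver your lease provides

With something like that in place, the bare cdn name in Comcast's download link resolves and the tool can be fetched.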

22 January 2011

Ian Wienand: Adding a javascript tracker to your Docbook document

If you convert a docbook document into chunked HTML form, it would presumably be very useful to place one of the various available javascript-based trackers on each of your pages to see what is most popular, linked to, etc. I'm assuming you've already got to the point where you have your DSSSL stylesheet going (see something like my prior entry). It doesn't appear you can get arbitrary output into the <head> tag -- the %html-header-tags% variable is really just designed for META tags and there doesn't appear to be anything else in the standard stylesheets to override. So the trick is to use $html-body-start$. But you have to be a little bit tricky to actually get your javascript to slip past the SGML parser. After several attempts and a bit of googling I finally ended up with the following for a Google Analytics tracker, using string-append to splice in the javascript comment filtering:
(define ($html-body-start$)
  (make element gi: "script"
        attributes: '(("language" "javascript")
                      ("type" "text/javascript"))
        (make formatting-instruction
          data: (string-append "<" "!--
 var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXXXXXX-1']);
  _gaq.push(['_trackPageview']);
  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
// --" ">"))))

3 January 2011

Ian Wienand: easygeotag.info

I think I've just about finished my Thanksgiving project, easygeotag.info.

I'm a little bit obsessive about geotagging my photos, and while I know there are many photo management solutions out there that can do it in various ways, I generally find it quicker and easier to use exiv2 and simple shell scripts to embed the location info directly into my files. I've tried a number of things that have never worked out better than simply using Google Maps and panning around to find locations. I even bought a GPS tracker which would supposedly automatically tag my photos; assuming of course it could ever get a GPS lock, it hadn't run out of batteries and corrupted its filesystem, you had all times in sync and could figure out the various timezone issues, daylight savings changes, etc. etc. I always feel safer having all my metadata embedded in the actual files, just in case Yahoo ever does a del.icio.us to Flickr (I use a little Python script with IPTC bindings for comments, which I then backup similarly obsessively locally and to Amazon S3).

The site is fairly simple in concept: it allows you to search for locations, easily extract the geotag info, and provides the ability to save frequently used locations for easy reference. Mostly it was an exercise for me to implement something after reading the excellent Javascript Patterns -- with YUI3, Google App Engine and OpenID all of which I managed to cram in. Although the audience may be limited (maybe just to me :) I hope someone else finds it useful for managing their memories! If you think this might be useful and would like the output in some other format, just let me know.
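For the curious, embedding coordinates with exiv2 looks something like the following sketch (coordinates are degree/minute/second rationals; the values and filename here are made up, and you should double-check the tag names against the exiv2 documentation):

$ exiv2 -M"set Exif.GPSInfo.GPSLatitude 37/1 46/1 30/1" \
        -M"set Exif.GPSInfo.GPSLatitudeRef N" \
        -M"set Exif.GPSInfo.GPSLongitude 122/1 25/1 6/1" \
        -M"set Exif.GPSInfo.GPSLongitudeRef W" photo.jpg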

19 November 2010

Ian Wienand: Symbol Versions and dependencies

The documentation on ld's symbol versioning syntax is a little bit vague on "dependencies", which it mentions but gives few details about. Let's construct a small example:
$ cat foo.c
#include <stdio.h>
#ifndef VERSION_2
void foo(int f) {
     printf("version 1 called\n");
}
#else
void foo_v1(int f) {
     printf("version 1 called\n");
}
__asm__(".symver foo_v1,foo@VERSION_1");
void foo_v2(int f) {
     printf("version 2 called\n");
}
/* i.e. foo_v2 is really foo@VERSION_2
 * @@ means this is the default version
 */
__asm__(".symver foo_v2,foo@@VERSION_2");
#endif
$ cat 1.ver
VERSION_1 {
      global:
      foo;
      local:
        *;
};
$ cat 2.ver
VERSION_1 {
      local:
        *;
};
VERSION_2 {
      foo;
} VERSION_1;
$ cat main.c
#include <stdio.h>
void foo(int);
int main(void) {
    foo(100);
    return 0;
}
$ cat Makefile
all: v1 v2
libfoo.so.1 : foo.c
	gcc -shared -fPIC -o libfoo.so.1 -Wl,--soname='libfoo.so.1' -Wl,--version-script=1.ver foo.c
libfoo.so.2 : foo.c
	gcc -shared -fPIC -DVERSION_2 -o libfoo.so.2 -Wl,--soname='libfoo.so.2' -Wl,--version-script=2.ver foo.c
v1: main.c libfoo.so.1
	ln -sf libfoo.so.1 libfoo.so
	gcc -Wall -o v1 -lfoo -L. -Wl,-rpath=. main.c
v2: main.c libfoo.so.2
	ln -sf libfoo.so.2 libfoo.so
	gcc -Wall -o v2 -lfoo -L. -Wl,-rpath=. main.c
.PHONY: clean
clean:
	rm -f libfoo* v1 v2
$ ./v1
version 1 called
$ ./v2
version 2 called
In words: we create two libraries, a version 1 and a version 2, where we provide a new version of foo in the version 2 library. The soname is set in the libraries, so v1 and v2 can distinguish the correct library to use. In the updated 2.ver version script, we say that VERSION_2 depends on VERSION_1. So, the question is, what does this mean? Does it have any effect? We can examine the version descriptors in the library and see that there is indeed a relationship recorded there.
$ readelf --version-info ./libfoo.so.2
[...]
Version definition section '.gnu.version_d' contains 3 entries:
  Addr: 0x0000000000000264  Offset: 0x000264  Link: 5 (.dynstr)
  000000: Rev: 1  Flags: BASE   Index: 1  Cnt: 1  Name: libfoo.so.2
  0x001c: Rev: 1  Flags: none  Index: 2  Cnt: 1  Name: VERSION_1
  0x0038: Rev: 1  Flags: none  Index: 3  Cnt: 2  Name: VERSION_2
  0x0054: Parent 1: VERSION_1
Looking at the specification, we can see that each version definition has a vd_aux field which is a linked list of, essentially, strings that give "the version or dependency name". This is a little vague for a specification; however, it appears to mean that the first entry is the name of the version definition itself, and any following elements are its dependencies. At least, this is how readelf interprets it when it shows you the "Parent" field in the output above. This implies something that the ld documentation doesn't mention: you may list multiple dependencies for a version node. That does work, and readelf will just report more parents if you try it.

So the question is, what does this dependency actually do? Well, as far as I can tell, nothing really. The dynamic loader doesn't look at the dependency information, and doesn't have any need to: it is looking to resolve something specific, foo@VERSION_2 for example, and doesn't really care that VERSION_1 even exists. ld does enforce the dependency, in that if a version node names a dependency that you leave out or accidentally erase, the link will fail. However, it doesn't really convey anything other than its intrinsic documentation value.
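For completeness, a version node with multiple dependencies would look something like the following untested sketch (I'm assuming the dependencies are simply space-separated after the closing brace); readelf should then report two parents for VERSION_3:

VERSION_3 {
      bar;
} VERSION_2 VERSION_1;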

21 October 2010

Ian Wienand: Logging POST requests with Apache

After getting a flood of spam, I became suspicious that there was an exploit in my blog software allowing easy robo-posts. Despite a code audit I couldn't see anything, and thus wanted to log the incoming POST requests before any local processing at all. It took me a while to figure out how to do this; hopefully this helps someone else. Firstly install libapache-mod-security; then the magic incantation is:
SecRuleEngine On
SecAuditEngine on
SecAuditLog /var/log/apache2/website-audit.log
SecRequestBodyAccess on
SecAuditLogParts ABIFHZ
SecDefaultAction "nolog,noauditlog,allow,phase:2"
SecRule REQUEST_METHOD "^POST$" "chain,allow,phase:2"
SecRule REQUEST_URI ".*" "auditlog"
So, to break it down a little: the default action says to do nothing during phase 2 (when the body is available for inspection); the allow means that we're indicating that nothing further will happen in any of the remaining phases, so the module can shortcut through them. The two SecRules work together -- the first says that any POST requests should be tested by the next rule (i.e. the chained rule), which in this case says that any request should be sent to the audit log. After that, the similar allow/phase argument again says that nothing further is going to happen in any of the subsequent phases mod_security can work on. As per the parts between A and Z, we'll log the headers, the request body, the final response headers and the trailer.

So, as it turns out, there is no exploit; it seems most likely there is an actual human behind the spam that gets through, because every time they take a guess it is correct. So I guess I'll take a glass-half-full kind of approach and rather than being annoyed at removing the spam, I'll just convince myself that I made a small donation from some spam overlord to one of their poor minions!
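As a postscript: to sanity-check that the rules are firing, you can fire off a POST by hand and confirm it lands in the audit log (the URL and form field here are hypothetical; any POST to your site should do):

$ curl -d "comment=hello" http://www.example.com/blog/comments
$ sudo tail /var/log/apache2/website-audit.log

GET requests should leave the log untouched, which is the point of the REQUEST_METHOD chain.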

24 August 2010

Ian Wienand: Split debugging info -- symbols

In a previous post I mentioned split debugging info. One addendum to this is how symbols are handled. Symbols are separate to debugging info (i.e. the stuff about variable names, types, etc. you get when -g is turned on), but necessary for a good debugging experience.

You have a choice, however, of where you leave the symbol files. You can leave them in your shipping binary/library so that users who don't have the full debugging info available will still get a back-trace that at least has function names. The cost is slightly larger files for everyone, even if the symbols are never used. This appears to be what Red Hat does with its system libraries, for example. The other option is to keep the symbols in the .debug files alongside the debug info. This results in smaller libraries, but really requires you to have the debug info files available to have workable debugging. This appears to be what Debian does.

So, how do you go about this? Well, it depends on what tools you're using. For binutils strip, there is some overlap between the --strip-debug and --only-keep-debug options: --strip-debug will keep the symbol table in the binary, and --only-keep-debug will also keep the symbol table.
$ gcc -g -o main main.c
$ readelf --sections ./main | grep symtab
  [36] .symtab           SYMTAB          00000000 000f48 000490 10     37  53  4
$ cp main main.debug
$ strip --only-keep-debug main.debug 
$ readelf --sections ./main.debug | grep symtab
  [36] .symtab           SYMTAB          00000000 000b1c 000490 10     37  53  4
$ strip --strip-debug ./main
$ readelf --sections ./main.debug | grep symtab
  [36] .symtab           SYMTAB          00000000 000b1c 000490 10     37  53  4
Of course, you can then strip (with no arguments) the final binary to get rid of the symbol table; but other than manually pulling out the .symtab section with objcopy (a sketch of that follows the backtrace example below) I'm not aware of any way to remove it from the debug info file.

Contrast with elfutils, more commonly used on Red Hat-based systems I think. eu-strip's --strip-debug does the same thing: it leaves the symtab section in the binary. However, it also has a -f option, which puts any sections removed during the strip into a separate file. Therefore, you can create any combination you wish: eu-strip -f results in a fully stripped binary with symbols and debug data in the .debug file, while eu-strip -g -f results in debug data only in the .debug file, with symbol data retained in the binary. The only thing to be careful about is using eu-strip -g -f and then further stripping the binary, consequently destroying the symbol table but retaining debug info. This can lead to some strange things in backtraces:
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip ./main
$ gdb ./main
GNU gdb (GDB) 7.1-debian
...
(gdb) break foo
Breakpoint 1 at 0x8048397: file main.c, line 2.
(gdb) r
Starting program: /home/ianw/tmp/symtab/main
Breakpoint 1, foo (i=100) at main.c:2
2         return i + 100;
(gdb) back
#0  foo (i=100) at main.c:2
#1  0x080483b1 in main () at main.c:6
#2  0x423f1c76 in __libc_start_main (main=Could not find the frame base for "__libc_start_main".
) at libc-start.c:228
#3  0x08048301 in ?? () 
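(As mentioned above, removing the symbol table from the debug-info file itself seems to require doing it by hand; an untested sketch with objcopy:)

$ objcopy --remove-section=.symtab --remove-section=.strtab main.debug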
One difference to note between strip and eu-strip is that binutils strip will leave the .gnu_debuglink section in, while eu-strip will not:
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ readelf --sections ./main | grep debuglink
  [29] .gnu_debuglink    PROGBITS        00000000 000bd8 000010 00      0   0  4
$ eu-strip main
$ readelf --sections ./main | grep debuglink
$ gcc -g -o main main.c
$ eu-strip -g -f main.debug main
$ strip main
$ readelf --sections ./main | grep debuglink
  [27] .gnu_debuglink    PROGBITS        00000000 0005d8 000010 00      0   0  4

11 May 2010

Ian Wienand: A short tour of TERM

I'd wager more people today don't know what the TERM environment variable really does than do (yes, everyone who used to use ed over a 300-baud acoustic-coupler is laughing, but times have changed!). I recently got hit by what was, at the time, a strange bug. Consider the following trivial Python program, which starts curses and issues the "hide the cursor" command.
$ cat test-curses.py
import curses
curses.initscr()
curses.curs_set(0)
curses.nocbreak()
curses.echo()
curses.endwin()
If you run it in your xterm or linux console, nothing should happen. But change your TERM to vt102 and it won't work:
$ TERM=vt102 python ./test-curses.py
Traceback (most recent call last):
  File "./test-curses.py", line 5, in <module>
    curses.curs_set(0)
_curses.error: curs_set() returned ERR
(obvious when you explicitly reset the term, not so obvious when some script is doing it behind your back -- as I can attest!) So, what really happened there? The curs_set man page gives us the first clue:
curs_set returns the previous cursor state, or ERR if the requested visibility is not supported.
How do we know if visibility is supported? For that, curses consults the terminfo library. This is a giant database stretching back to the dawn of time that says what certain terminals are capable of, and how to get them to do whatever it is you want them to do. The TERM environment variable tells the terminfo libraries what database entry to use (see /usr/share/terminfo/*) when reporting features to people who ask, such as ncurses. man 5 terminfo tells us:
cursor_invisible    civis    vi    make cursor invisible
So curses is asking terminfo how to make the cursor invisible, and getting back "that's not possible here", and hence throwing our error. But how can a person tell if their current terminal supports this feature? infocmp will echo out the features and what escape codes are used to access them. So we can check easily:
$ echo $TERM
xterm
$ infocmp | grep civis
					   bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
$ TERM=vt102 infocmp | grep civis
$
So we can now clearly see that the vt102 terminal doesn't support civis, and there is thus no way to make the cursor invisible; hence the error code. If you're ever looking for a good read, check out the termcap/terminfo master file. The comments are like a little slice of the history of computing!
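You can also get at the same information from within Python, since the curses module exposes the terminfo lookup directly; a small sketch (the output shown is what I'd expect given the infocmp results above):

$ python -c "import curses; curses.setupterm(); print repr(curses.tigetstr('civis'))"
'\x1b[?25l'
$ TERM=vt102 python -c "import curses; curses.setupterm(); print repr(curses.tigetstr('civis'))"
None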

10 May 2010

Ian Wienand: Cook's Illustrated v Food52 Cookie Challenge

I saw Christopher Kimball, doyen of the Cook's Illustrated empire, at our local bookstore a while ago and, as one of my old professors would say, he was "good value". He did, however, have a bit of a rant about the internet and how random websites just did not produce recipes that could compare to the meticulous testing that a Cook's Illustrated recipe went through. I was interested to see this come to a head on Slate, where Cook's Illustrated has put their recipes up against food52.com for a head-to-head battle.

So yesterday, my wife and I took up the challenge. We resolved to follow both recipes meticulously, which I believe we achieved after a trip for ingredients. Neither was particularly easier or harder than the other -- the Cook's Illustrated had a lot of fiddling with spices, while the food52.com one required creaming the butter and sugar. Both came out pretty much as you would expect, although the Cook's Illustrated ones were a little flat. The recipe does say to be careful to not overwork the dough; it may be partially user error.

We were split. I liked the Cook's Illustrated one better, as the spices really were quite mellow and very enjoyable with a cup of coffee. My wife tended towards the plainer food52.com ones, but she is a big fan of a plain sugar cookie. The ultimate test, however, was leaving both of them on the bench at work with a request to vote. The winner was clear -- 10 votes for Cook's Illustrated and only 3 for food52.com. So, maybe Kimball has a point. Either way, when there's cookies, everyone's a winner! Some photos of the results: http://www.flickr.com/photos/iwienand/sets/72157623913498245/

9 May 2010

Ian Wienand: This laptop has Super Cow Powers

Tip: if you would like your own cow sticker, take a cute child through the checkouts at your local Whole Foods.

6 May 2010

Ian Wienand: How much slower is cross compiling to your own host?

The usual case for cross-compiling is that your target is so woefully slow and under-powered that you would be insane to do anything else. However, sometimes for one of the best reasons of all, "historical reasons", you might ship a 64-bit product but support building on 32-bit hosts, and thus cross-compile even on a very fast architecture like x86. How much does this cost, given that almost everyone is running the 32-bit cross-compiler on a modern 64-bit machine anyway? To test, I got a 32-bit cross and a 64-bit native x86_64 compiler and toolchain; in this case based on gcc-4.1.2 and binutils 2.17. I then did an allyesconfig build of a Linux 2.6.33 x86_64 kernel three times with the cross-compiler toolchain and then with the native one. The results (in seconds):
          32-bit   64-bit
          6090     5684
          6050     5616
          6063     5652
average   6067     5650
So, all up, building the 64-bit kernel natively took ~7% less time than building it with the 32-bit cross-compiler on the same 64-bit machine.

22 March 2010

Ian Wienand: What exactly does -Bsymbolic do? -- update

Some time ago I wrote a description of the -Bsymbolic linker flag, which could do with some further explanation. The original article is a good starting place. One interesting point that I didn't go into was the potential for code optimisation -Bsymbolic brings about. I'm not sure if I missed that at the time or the toolchain changed; both are probably equally likely! Let me recap the example...
ianw@jj:/tmp/bsymbolic$ cat Makefile 
all: test test-bsymbolic
clean:
	rm -f *.so test testsym
liboverride.so : liboverride.c
	$(CC) -Wall -O2 -shared -fPIC -o liboverride.so $<
libtest.so : libtest.c
	$(CC) -Wall -O2 -shared -fPIC -o libtest.so $<
libtest-bsymbolic.so : libtest.c
	$(CC) -Wall -O2 -shared -fPIC -Wl,-Bsymbolic -o $@ $<
test : test.c libtest.so liboverride.so
	$(CC) -Wall -O2 -L. -Wl,-rpath=. -ltest -o $@ $<
test-bsymbolic : test.c libtest-bsymbolic.so liboverride.so
	$(CC) -Wall -O2 -L. -Wl,-rpath=. -ltest-bsymbolic -o $@ $<
$ cat liboverride.c 
#include <stdio.h>
int foo(void)
{
	printf("override foo called\n");
	return 0;
}
$ cat libtest.c 
#include <stdio.h>
int foo(void)
{
    printf("libtest foo called\n");
    return 1;
}
int test_foo(void)
{
    return foo();
}
$ cat test.c
#include <stdio.h>
int test_foo(void);
int main(void)
{
	printf("%d\n", test_foo());
	return 0;
}
In words: libtest.so provides test_foo(), which calls foo() to do the actual work. libtest-bsymbolic.so is simply built with the flag in question, -Bsymbolic. liboverride.so provides an alternative version of foo() designed to override the original via an LD_PRELOAD of the library. test is built against libtest.so, test-bsymbolic against libtest-bsymbolic.so. Running the examples, we can see that the LD_PRELOAD does not override the symbol in the library built with -Bsymbolic.
$ ./test
libtest foo called
1
$ ./test-bsymbolic 
libtest foo called
1
$ LD_PRELOAD=liboverride.so ./test
override foo called
0
$ LD_PRELOAD=liboverride.so ./test-bsymbolic 
libtest foo called
1
There are a couple of things going on here. Firstly, you can see that the SYMBOLIC flag is set in the dynamic section, leading to the dynamic linker behaviour I explained in the original article:
ianw@jj:/tmp/bsymbolic$ readelf --dynamic ./libtest-bsymbolic.so 
Dynamic section at offset 0x550 contains 22 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000010 (SYMBOLIC)                   0x0
...
However, there is also an effect on generated code. Have a look at the PLTs:
$ objdump --disassemble-all ./libtest.so
Disassembly of section .plt:
[... blah ...]
0000039c <foo>:
 39c:   ff a3 10 00 00 00       jmp    *0x10(%ebx)
 3a2:   68 08 00 00 00          push   $0x8
 3a7:   e9 d0 ff ff ff          jmp    37c <_init+0x30>
$ objdump --disassemble-all ./libtest-bsymbolic.so
Disassembly of section .plt:
00000374 <__gmon_start__@plt-0x10>:
 374:   ff b3 04 00 00 00       pushl  0x4(%ebx)
 37a:   ff a3 08 00 00 00       jmp    *0x8(%ebx)
 380:   00 00                   add    %al,(%eax)
        ...
00000384 <__gmon_start__@plt>:
 384:   ff a3 0c 00 00 00       jmp    *0xc(%ebx)
 38a:   68 00 00 00 00          push   $0x0
 38f:   e9 e0 ff ff ff          jmp    374 <_init+0x30>
00000394 <puts>:
 394:   ff a3 10 00 00 00       jmp    *0x10(%ebx)
 39a:   68 08 00 00 00          push   $0x8
 39f:   e9 d0 ff ff ff          jmp    374 <_init+0x30>
000003a4 <__cxa_finalize@plt>:
 3a4:   ff a3 14 00 00 00       jmp    *0x14(%ebx)
 3aa:   68 10 00 00 00          push   $0x10
 3af:   e9 c0 ff ff ff          jmp    374 <_init+0x30>
Notice the difference? There is no PLT entry for foo() when -Bsymbolic is used. Effectively, the toolchain has noticed that foo() can never be overridden and optimised out the PLT call for it. This is analogous to using "hidden" attributes for symbols, which I have detailed in another article on symbol visibility attributes (which also goes into PLTs, if the above meant nothing to you).

So -Bsymbolic does have some more side-effects than just setting a flag to tell the dynamic linker about binding rules -- it can actually result in optimised code. However, I'm still struggling to find good use-cases for -Bsymbolic that can't be better done with version scripts and visibility attributes. I would certainly recommend using those methods if at all possible. Thanks to Ryan Lortie for comments on the original article.
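For comparison, a minimal sketch of getting the same PLT elimination for foo() with a visibility attribute instead of -Bsymbolic; unlike -Bsymbolic this is per-symbol, and it removes foo from the dynamic symbol table entirely:

/* libtest.c, sketch: foo() can no longer be interposed, so the
 * compiler and linker can bind the call in test_foo() locally
 * rather than going via the PLT */
#include <stdio.h>
__attribute__((visibility("hidden"))) int foo(void);
int foo(void)
{
    printf("libtest foo called\n");
    return 1;
}
int test_foo(void)
{
    return foo();
}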

16 March 2010

Ian Wienand: Handling hostnames, UDP and IPv6 in Python

So, you have some application where you want the user to specify a remote host/port, and you want to support IPv4 and IPv6. For literal addresses, things are fairly simple: IPv4 addresses are unambiguous, and RFC2732 has things covered by putting the IPv6 address within square brackets.

It gets more interesting as to what you should do with hostnames. The problem is that getaddrinfo can return multiple addresses, but without extra disambiguation from the user it is very difficult to know which one to choose. RFC4472 discusses this, but there does not appear to be any good solution. Possibly you can do something like ping/ping6 and have a separate program name or configuration flag to choose IPv6. This comes at a cost of transparency.

The glibc implementation of getaddrinfo() puts considerable effort into deciding if you have an IPv6 interface up and running before it will return an IPv6 address. It will even recognise link-local addresses and sort addresses more likely to work to the front of the returned list, as described here. However, there is still a small possibility that the IPv6 interface doesn't actually work, and so the library will sort the IPv6 address first in the returned list when maybe it shouldn't.

If you are using TCP, you can connect to each address in turn to find one that works. With UDP, however, the connect essentially does nothing. So I believe probably the best way to handle hostnames for UDP connections, at least on Linux/glibc, is to trust getaddrinfo to return the sanest values first, try a connect on the socket anyway just for extra safety, and then essentially hope it works. Below is some example code to do that (the literal address splitter bit is stolen from Python's httplib).
import socket
DEFAULT_PORT = 123
host = '[fe80::21c:a0ff:fb27:7196]:567'
# the port will be anything after the last :
p = host.rfind(":")
# ipv6 literals should have a closing brace
b = host.rfind("]")
# if the last : is outside the [addr] part (or we don't have []'s at all)
if (p > b):
    try:
        port = int(host[p+1:])
    except ValueError:
        print "Non-numeric port"
        raise
    host = host[:p]
else:
    port = DEFAULT_PORT
# now strip off ipv6 []'s if there are any
if host and host[0] == '[' and host[-1] == ']':
    host = host[1:-1]
print "host = <%s>, port = <%d>" % (host, port)
the_socket = None
res = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_DGRAM)
# go through all the returned values, and choose the ipv6 one if
# we see it.
for r in res:
    af,socktype,proto,cname,sa = r
    try:
        the_socket = socket.socket(af, socktype, proto)
        the_socket.connect(sa)
    except socket.error, e:
        # connect failed!  try the next one
        continue
    break
if the_socket is None:
    raise socket.error, "Could not get address!"
# ready to send!
the_socket.send("hi!")

5 March 2010

Ian Wienand: RFC3164 smells

From RFC3164, which is otherwise about syslog formats:
6. Security Considerations An odor may be considered to be a message that does not require any acknowledgement. People tend to avoid bad odors but are drawn to odors that they associate with good food. The acknowledgement of the receipt of the odor or scent is not required and indeed it may be the height of discretion to totally ignore some odors. On the other hand, it is usually considered good civility to acknowledge the prowess of the cook merely from the ambiance wafting from the kitchen. Similarly, various species have been found to utilize odors to attract mates. One species of moth uses this scent to find each other. However, it has been found that bolas spiders can mimic the odor of the female moths of this species. This scent will then attract male moths, which will follow it with the expectation of finding a mate. Instead, when they arrive at the source of the scent, they will be eaten [8]. This is a case of a false message being sent out with inimical intent. ... Along the lines of the analogy, computer event messages may be sent accidentally, erroneously and even maliciously.
This smells more like "I bet nobody ever really reads this RFC, let's put some stuff in the middle to see if they do!".

26 February 2010

Ian Wienand: DIG Jazz Applet, V3

The ABC overhauled DIG Jazz (now I think it's just called "ABC Jazz") and upgraded from the oh-so-2008 XML playlist to a much more web-cool JSON one. Hence Version 3 (source) of the applet, now with improved HTML escaping and different colors. Check out The Dilworths while you're there!

25 February 2010

Ian Wienand: Building a quiet, cool media/house server

After getting sick of having to underclock my existing home server to get it to remain up for any period of time, along with the horrendous noise, I finally found the time and budget to rebuild. My goals were: … In the end I went with: …

The case is really awesome. Very easy to access, and the fan is extremely quiet. It very cleverly holds 3 full-size hard-disks; two mounted vertically on either side, and one horizontally in the middle. The final space is for a DVD -- after that it's pretty cramped inside! It claims to have a very efficient power supply. It has a very bright blue LED on the front, and the face-plate over the DVD looks nice, but the button doesn't quite reach the eject button on my drive, so it's software eject only. All up, definitely recommended.

The motherboard is fairly good. One annoying thing is that it only has 3 SATA ports -- I think it's reasonable to want a primary drive, two mirrored large disks plus a DVD for a nice little media server. It also has no parallel IDE, which is reasonable these days. However, if you wanted to install a wireless card you'd be out of luck if you also wanted to put in another SATA card, as it only has one PCI slot. There are plenty of USB ports for a USB wireless card, however. It also comes with two SATA cables, which isn't mentioned anywhere I could see (hopefully this saves you a trip back to Fry's to return your extra cables :). The CPU fan is a little on the loud side, as mentioned in some other forums. It is also located right where the power supply cables come down in this case, making for a fairly tight fit. Here's an lspci for those interested in such things:
00:00.0 Host bridge: nVidia Corporation MCP79 Host Bridge (rev b1)
00:00.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.0 ISA bridge: nVidia Corporation MCP79 LPC Bridge (rev b2)
00:03.1 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.2 SMBus: nVidia Corporation MCP79 SMBus (rev b1)
00:03.3 RAM memory: nVidia Corporation MCP79 Memory Controller (rev b1)
00:03.5 Co-processor: nVidia Corporation MCP79 Co-processor (rev b1)
00:04.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:04.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:06.0 USB Controller: nVidia Corporation MCP79 OHCI USB 1.1 Controller (rev b1)
00:06.1 USB Controller: nVidia Corporation MCP79 EHCI USB 2.0 Controller (rev b1)
00:08.0 Audio device: nVidia Corporation MCP79 High Definition Audio (rev b1)
00:09.0 PCI bridge: nVidia Corporation MCP79 PCI Bridge (rev b1)
00:0b.0 IDE interface: nVidia Corporation MCP79 SATA Controller (rev b1)
00:10.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
00:15.0 PCI bridge: nVidia Corporation MCP79 PCI Express Bridge (rev b1)
01:05.0 RAID bus controller: Silicon Image, Inc. PCI0680 Ultra ATA-133 Host Controller (rev 02)
02:00.0 VGA compatible controller: nVidia Corporation ION VGA (rev b1)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 03)
I know there are issues with the controller of the Patriot SSD, which would probably worry me for a general-purpose machine. However, for the primary disk of a server machine I'm not too fussed. The silence is golden, and the cool running and low power consumption really help too. I wouldn't really say it seems that blazing fast. It comes with a handy mount so it fits in a full-size hard-disk slot. Hard to beat for the price. One concession I made to avoid excessive writes was mounting /tmp on tmpfs; this is where it's nice to have 4GB of RAM. Handbrake seems to do a lot of work in /tmp, for example.

The two green hard-disks (software mirrored) are also inaudible. I'm hoping different brands should at least not fail at the same time. I'm not sure if they're auto-spinning down, but you can't really tell if they're on or not, even under load. Even when running flat-out re-encoding a DVD in the enclosed cabinet with minimal airflow the CPU temperature hasn't gone above 60C; normal CPU temp is around 40C. Performance is adequate -- running PlayOn in a VMware Workstation XP VM almost works.

It's kind of hard to take photos inside a case, but here's an attempt: http://www.flickr.com/photos/iwienand/sets/72157623509152266/

Debian unstable installed without issues (except for some weird bug with the boot hanging for 60 seconds due to the RTL driver). It is also very strange installing from USB to an SSD: no noise!

21 January 2010

Ian Wienand: Separate debug info

I've recently found out a bit more about separating debug info, and thought a consolidated reference might be handy.

The Idea

Most every distribution now provides separate debug packages which contain only the debug info, saving much space for the 99% of people who never want to start gdb. This is achieved with objcopy and --only-keep-debug/--add-gnu-debuglink and is well explained in the man page. What does this do? It adds a .gnu_debuglink section to the binary with the name of the debug file to look for.
$ gcc -g -shared -o libtest.so libtest.c
$ objcopy --only-keep-debug libtest.so libtest.debug
$ objcopy --add-gnu-debuglink=libtest.debug libtest.so
$ objdump -s -j .gnu_debuglink libtest.so
libtest.so:     file format elf32-i386
Contents of section .gnu_debuglink:
 0000 6c696274 6573742e 64656275 67000000  libtest.debug...
 0010 52a7fd0a                             R... 
The first part is the name of the file, the second part is a check-sum of the debug-info file for later reference.

Build ID

Did you know that binaries also get stamped with a unique id when they are built? The ld --build-id flag stamps in a hash near the end of the link.
$ readelf --wide --sections ./libtest.so | grep build
  [ 1] .note.gnu.build-id NOTE            000000d4 0000d4 000024 00   A  0   0  4
$ objdump -s -j .note.gnu.build-id libtest.so 
libtest.so:     file format elf32-i386
Contents of section .note.gnu.build-id:
 00d4 04000000 14000000 03000000 474e5500  ............GNU.
 00e4 a07ab0e4 7cd54f60 0f5cf66b 5799b05c  .z..|.O`.\.kW..\
 00f4 2d43f456                             -C.V            
In case you're wondering what the format of that is...
uint32 name_size; /* size of the name */
uint32 hash_size; /* size of the hash */
uint32 identifier; /* NT_GNU_BUILD_ID == 0x3 */
char   name[name_size]; /* the name "GNU" */
char   hash[hash_size]; /* the hash */
Although the actual file may change (due to prelink or similar), the hash will not be updated and will remain constant.

Finding the debug info files

The last piece of the puzzle is how gdb attempts to find the debug-info files when it is run. The main variable influencing this is debug-file-directory.
(gdb) show debug-file-directory 
The directory where separate debug symbols are searched for is "/usr/lib/debug".
The first thing gdb does, which you can verify via an strace, is search for a file called [debug-file-directory]/.build-id/xx/yyyy.debug, where xx is the first two hexadecimal digits of the hash and yyyy is the rest of it:
$ objdump -s -j .note.gnu.build-id /bin/ls
/bin/ls:     file format elf32-i386
Contents of section .note.gnu.build-id:
 8048168 04000000 14000000 03000000 474e5500  ............GNU.
 8048178 c6fd8024 2a11673c 7c6a5af6 2c65b1b5  ...$*.g<|jZ.,e..
 8048188 d7e13fd4                             ..?.            
... [running gdb /bin/ls] ...
access("/usr/lib/debug/.build-id/c6/fd80242a11673c7c6a5af62c65b1b5d7e13fd4.debug", F_OK) = -1 ENOENT (No such file or directory)
Next it moves on to the debug-link info filename. First it looks for the filename in the same directory as the object being debugged. After that it looks for the file in a sub-directory called .debug/ in the same directory. Finally, it prepends the debug-file-directory to the path of the object being inspected and looks for the debug info there. This is why the /usr/lib/debug directory looks like the root of a file-system: if you're looking for the debug info of /usr/lib/libfoo.so, it will be looked for in /usr/lib/debug/usr/lib/libfoo.so.

Interestingly, the sysroot and solib-search-path don't appear to have anything to do with these lookups. So if you change the sysroot, you also need to change the debug-file-directory to match. However, most distributions make all this "just work", so hopefully you'll never have to worry about it anyway!
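As an aside, this build-id lookup is why distribution debug packages install their files into a .build-id directory tree. Wiring it up by hand for the libtest.so example above would look something like the following sketch (using the hash from the readelf output earlier; the .debug file path is whatever you chose when you split it out):

$ mkdir -p /usr/lib/debug/.build-id/a0
$ ln -s /path/to/libtest.debug \
    /usr/lib/debug/.build-id/a0/7ab0e47cd54f600f5cf66b5799b05c2d43f456.debug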

11 January 2010

Ian Wienand: Salton (Sim)City

I was recently driving through the California desert and came across the Salton Sea. Long story short: it rained a lot, the Colorado River overflowed a bunch of dams and dikes meant to contain it, and created a huge inland sea. Oops. Some enterprising souls must have decided that despite the lack of any natural flushing dooming the sea to a salty, polluted existence, there was ripe opportunity to create a sea-side metropolis. From the ground, it is a bit of a fun ghost town to explore. The typical "everything just abandoned" type thing. But when I came to geotag some photos I took there, I was quite astonished to see this.

That looks exactly like what I used to do in SimCity. I'd use the F-U-N-D-S cheat at the start to max out my money, then build my little empire with neat roads and schools and harbours and whatnot -- they've even got an airport! Then I'd press "go" and people would slowly move in to the residential areas, one house on one block at a time. I guess poor old Salton City never made it past "turtle speed"!

8 January 2010

Ian Wienand: vi backup files considered harmful

Mark this one down as another in the long list of "duh" moments once you realise what is going on! A bug report comes in about a long-running daemon that has stopped logging. lsof reports the log file is now named logfile~ and, furthermore, is deleted! This happens after a system-upgrade scenario, so of course I go off digging through a multitude of scripts and what-not to find the culprit... Have you got it yet? Try this...
# lsof | grep syslogd | grep messages
syslogd    1376        root   15w      REG        3,1    99851    4605625 /var/log/messages
# cd /var/log/
# vi messages (and save the file)
root@jj:/var/log# lsof | grep syslogd | grep messages
syslogd    1376        root   15w      REG        3,1    99851    4605625 /var/log/messages~ (deleted)
vi is very careful and renames your existing file, so that if anything goes wrong when writing the new version you can get something back. It's a shame the daemon doesn't know about this! The kernel is happy to deal with the rename, but once the backup file is unlinked you're out of luck. Confusingly, to a casual inspection your log file looks like it's there ... just that nothing is going into it. (Oh, and if you tried that, you might like to restart syslogd now :) Moral of the story -- overcome that finger-memory and never use vi on a live file; you're asking for trouble!
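(If the finger-memory is too strong, vim at least has an option that makes it copy the original file for the backup and write the new contents over the original in place, which keeps the inode, and thus the daemon's file descriptor, pointing at the live file:)

" in ~/.vimrc
set backupcopy=yes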

23 December 2009

Ian Wienand: Stripping shared libraries

So, how to strip a shared library? The documentation for --strip-unneeded states that it removes all symbols that are not needed for relocation processing. This is a little cryptic, because one might reasonably assume that a shared library can be "relocated", in that it can be loaded anywhere. However, what this really refers to is object files that are usually built and bundled into a .a archive for static linking. For an object in a static library archive to still be useful, global symbols must be kept, although static symbols can be removed. Take the following small example:
$ cat libtest.c
static int static_var = 100;
int global_var = 100;
static int static_function(void)
{
       return static_var;
}
int global_function(int i)
{
    return static_function() + global_var + i;
}
Before stripping:
$ gcc -c -fPIC -o libtest.o libtest.c
$ readelf --symbols ./libtest.o
Symbol table '.symtab' contains 18 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
     5: 00000000     4 OBJECT  LOCAL  DEFAULT    5 static_var
     6: 00000000    22 FUNC    LOCAL  DEFAULT    3 static_function
    13: 00000004     4 OBJECT  GLOBAL DEFAULT    5 global_var
    16: 00000016    36 FUNC    GLOBAL DEFAULT    3 global_function
After stripping:
$ strip --strip-unneeded libtest.o
$ readelf --symbols ./libtest.o 
Symbol table '.symtab' contains 15 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
    10: 00000004     4 OBJECT  GLOBAL DEFAULT    5 global_var
    13: 00000016    36 FUNC    GLOBAL DEFAULT    3 global_function
If you --strip-all from this object file, it will remove the entire .symtab section and the object will be useless for further linking, because you'll never be able to find global_function to call it! Shared libraries are different, however. Shared libraries keep global symbols in a separate ELF section called .dynsym. --strip-all will not touch the dynamic symbol entries, and it is therefore safe to remove all the "standard" symbols from the output file without affecting the usability of the shared library. For example, readelf will still show the .dynsym symbols even after stripping:
$ gcc -shared -fPIC -o libtest.so libtest.c
$ strip --strip-all ./libtest.so 
$ readelf  --syms ./libtest.so 
Symbol table '.dynsym' contains 11 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
...
     6: 00000452    36 FUNC    GLOBAL DEFAULT   12 global_function
    10: 000015e0     4 OBJECT  GLOBAL DEFAULT   21 global_var
However, --strip-unneeded is smart enough to realise that a shared-object library doesn't need the .symtab section either, and removes it. So, conclusions? --strip-all is safe on shared libraries, because global symbols remain in a separate section, but not on objects for inclusion in static libraries (relocatable objects). --strip-unneeded is safe for both, and automatically understands that shared objects do not need any .symtab entries to function, removing them and effectively doing the same work as --strip-all. So, --strip-unneeded is essentially the only tool you need for standard stripping needs!
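You can verify that last point directly: after --strip-unneeded on the shared library, the .symtab section should be gone from the section listing while the dynamic symbols survive.

$ gcc -shared -fPIC -o libtest.so libtest.c
$ strip --strip-unneeded ./libtest.so
$ readelf --sections ./libtest.so | grep symtab
$ readelf --syms ./libtest.so | grep global_function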

22 December 2009

Ian Wienand: An open letter to the bald-headed salesman in Harvey Norman, Norwest, Castle Hill

I usually find blog rants useless, but sometimes something is just so annoying one is sufficiently inspired. Today I went with my parents to buy them a Tivo at Harvey Norman, Norwest, Castle Hill, NSW, Australia. I am a big Tivo fan; the interface is good and it "just works". I don't mind paying for (or in this case, recommending paying for) good products.

After selecting the Tivo model, I asked for a HDMI cable. The salesman asked a series of questions about what sort of HD TV it was being plugged into; I quickly sensed this as a probe to see what sort of suckers we were, and requested just a "normal" cable. At this point, he insisted on a $130 (you guessed it) Monster cable, and had the audacity to say that we didn't need one of the really expensive cables because our TV wasn't good enough! I openly expressed my concern, but the annoying high-pressure sales pitch had just begun. The amount of, frankly, crap that he spewed about 4-bit this, 10-bit that, legislating of labels, DA signal levels, mythical customers who regretted buying the cheap cables and who knows what else was to the point of being comical if it weren't so insistent and said with such seeming authority.

There is only one thing that matters: whether the cable has passed the functional requirements for being certified to carry the distinctive HDMI logo. From the HDMI FAQ:
Q. What testing is required? Prior to mass producing or distributing any Licensed Product or component that claims compliance with the HDMI Specification (or allowing someone else to do such activities), each Adopter must test a representative sample for HDMI compliance. First, the Adopter must self test as specified in the then-current HDMI Compliance Test Specification. The HDMI Compliance Test Specification provides a suite of testing procedures, and establishes certain minimum requirements specifying how each HDMI Adopter should test Licensed Products for conformance to the HDMI Specification.
Now, I can understand that if you buy any old HDMI cable off Ebay for $1, it may be a knock-off that uses the HDMI logo illegally. But there is no way that the certified $50 Philips cable (still very over-priced, but at least not insane, and discounted to $35) performs any differently to some overpriced Monster model certified to exactly the same standard.

The thing that annoyed me most was his analogy to buying a tyre. He stated that if I walked up to a tyre salesman and said "I don't want the Pirellis, just put the cheap-o tyres on my Ferrari", I'd be insane, and thus by extension of that logic I was insane for not buying a Monster cable for my great new Tivo. This analogy is completely flawed and really just dishonest. A Ferrari is much more powerful and goes much faster than a standard car. It is plausible it needs a better-engineered tyre to perform adequately given the additional stresses it undergoes. A Tivo doesn't put out any more or any less bits than any other HDMI-certified equipment, no matter what you do. If the cable is certified as getting all the bits to the other end under whatever environmental conditions the HDMI people specify, then it's going to work for the 99% of people with normal requirements.

Nobody wants to make a significant investment in a piece of audio-visual equipment and feel they are getting something that isn't optimal. Harvey Norman's use of this understandable consumer sentiment to sell ridiculously over-priced cables that do nothing is extremely disappointing. I'm sure the commissions on these things encourage this behaviour, so it is useless expecting the retailer or individual sales assistant to change their policy and recommend reasonably priced cables. However, it is really Tivo and the other manufacturers who get the raw end of this deal; a $130 cable is over 20% of the price of the actual Tivo! That is surely affecting people's purchasing decisions. If Tivo and others included a certified HDMI cable with their device, as they do with component cables, and had "Certified HDMI 1.3 cable included" plastered on the box, it would be a harder sell to explain why the manufacturer would bother shipping a certified cable that is supposedly insufficient, and consumers would hopefully avoid the very distasteful high-pressure theatrics I was subjected to today.
